Skip to content

refactor(store,cache)!: optimize cache restore as O(1)#3134

Merged
julienrbrt merged 13 commits intomainfrom
julien/cache
Mar 6, 2026
Merged

refactor(store,cache)!: optimize cache restore as O(1)#3134
julienrbrt merged 13 commits intomainfrom
julien/cache

Conversation

@julienrbrt
Copy link
Member

@julienrbrt julienrbrt commented Mar 4, 2026

Overview

Optimize cache restore as O(n) + prep v1.0.0.

Summary by CodeRabbit

  • New Features

    • v1.0.0: snapshot-based persisted in-flight state and improved restart recovery; node pruning with configurable modes; DA header/data inclusion retrieval by height.
  • Bug Fixes

    • Improved DA inclusion persistence across restarts and height-based fallbacks when hashes are missing; safer timer clamp to avoid negative durations.
  • Tests

    • Expanded restart/snapshot and DA-inclusion test coverage to validate placeholder and restoration behavior.
  • Chores

    • Removed metadata-by-prefix API and related mock/test support.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 4, 2026

Warning

Rate limit exceeded

@julienrbrt has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 12 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d0158322-0af9-4e1f-85f5-c92a608095c3

📥 Commits

Reviewing files that changed from the base of the PR and between 3834cfd and e668b57.

📒 Files selected for processing (1)
  • block/internal/cache/generic_cache.go
📝 Walkthrough

Walkthrough

Adds a persisted in-flight DA snapshot mechanism to the block cache, extends cache manager APIs for height-based DA lookups, adds restart fallback logic, removes store prefix-metadata querying (and related types/tests), and expands tests covering snapshot restore and restart scenarios.

Changes

Cohort / File(s) Summary
Changelog
CHANGELOG.md
Adds v1.0.0 release notes describing disaster recovery, DA-only block catch-up on restart, DA/P2P verification, and pruning features.
Cache: snapshot persistence & API
block/internal/cache/generic_cache.go
Implements consolidated persisted snapshot for in‑flight entries, adds encode/decode and height→hash index, public helpers HeightPlaceholderKey, SaveToStore and ClearFromStore; replaces per-entry store writes with single snapshot key approach.
Cache tests
block/internal/cache/generic_cache_test.go
Large test expansion and refactor: adds writeSnapshot helper and many restore/restart scenarios validating snapshot round-trip, placeholder handling, and DA height recovery.
Cache manager: API & restart init
block/internal/cache/manager.go
Promotes DA-inclusion prefixes to exported constants, extends CacheManager with GetHeaderDAIncludedByHeight and GetDataDAIncludedByHeight, adds init routine to seed max DA heights from store metadata, and adapts save/restore/clear semantics.
Manager tests
block/internal/cache/manager_test.go
Updates tests to use pkgstore alias, simulates height→DA metadata, and adjusts assertions to reflect snapshot vs finalized-tip restoration semantics.
Submitter: height fallbacks
block/internal/submitting/submitter.go
Adds height-based fallback lookups in setNodeHeightToDAHeight and IsHeightDAIncluded when hash-based lookups miss (e.g., post-restart with placeholders); improves error messages.
Submitter tests
block/internal/submitting/submitter_test.go
Adds tests verifying IsHeightDAIncluded and setNodeHeightToDAHeight behavior across restart with placeholder snapshots and subsequent real-hash re-firing.
Store API removal & related
pkg/store/store.go, pkg/store/types.go, pkg/store/tracing.go, pkg/store/store_test.go, pkg/store/tracing_test.go, test/mocks/store.go
Removes GetMetadataByPrefix from DefaultStore and Metadata interface, deletes MetadataEntry type, removes tracing/mock support and two related tests — eliminating prefix-based metadata query capability.
Executor timer clamp
block/internal/executing/executor.go
Replaces negative clamp with non‑negative clamp (use max(..., 0)) when resetting block timer.
Sequencer tests style
pkg/sequencers/single/sequencer_test.go
Minor stylistic loop changes in tests (range-based loops).
Node logging
node/full.go
Adjusts aggregator/sync startup log messages text for clarity.

Sequence Diagram(s)

sequenceDiagram
    participant Node
    participant CacheManager
    participant Cache
    participant Store
    participant DARetriever

    Node->>CacheManager: RestoreFromStore()
    activate CacheManager
    CacheManager->>Store: Read snapshotKey
    Store-->>CacheManager: snapshot (or empty)
    alt snapshot present
        CacheManager->>Cache: Restore in-flight height→DA entries (placeholders)
    end
    CacheManager->>Store: Read finalized-tip metadata (HeightToDAHeight)
    Store-->>CacheManager: max DA height metadata
    CacheManager->>Cache: Initialize maxDAHeight
    deactivate CacheManager

    DARetriever->>CacheManager: setNodeHeightToDAHeight(height, hash)
    activate CacheManager
    alt hash present in cache
        CacheManager->>Cache: lookup by hash → daHeight
    else hash missing (post-restart)
        CacheManager->>Cache: HeightPlaceholderKey lookup → placeholder daHeight
    end
    CacheManager->>Store: Persist HeightToDAHeight metadata
    deactivate CacheManager

    Node->>CacheManager: SaveToStore()
    activate CacheManager
    CacheManager->>Cache: Encode current snapshot
    Cache-->>CacheManager: snapshot bytes
    CacheManager->>Store: Write snapshotKey
    deactivate CacheManager
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

T:testing, T:code-hygiene

Suggested reviewers

  • alpe
  • chatton
  • tuxcanfly

Poem

🐰 Snapshots stored in gentle rows,
Heights and hashes find repose,
On restart I hop, I check, I mend,
Placeholders hold until real hashes send,
DA paths restored — hop back again! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description provides minimal context beyond restating the title. It lacks detailed explanation of the rationale, benefits, technical approach, and any associated issues or breaking changes. Expand the description to explain the optimization approach, performance improvements, reasons for the refactor, breaking changes, and any related issues or context for v1.0.0 preparation.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: optimizing cache restore performance and the breaking changes to store/cache (indicated by the ! in semantic commit convention).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch julien/cache

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedMar 6, 2026, 9:57 AM

@claude
Copy link
Contributor

claude bot commented Mar 4, 2026

Claude finished @julienrbrt's task in 2m 15s —— View job


Code Review: refactor(store,cache)!: optimize cache restore as O(1)

  • Gather context and understand the PR
  • Review cache snapshot implementation (generic_cache.go)
  • Review cache manager changes (manager.go)
  • Review submitter changes (submitter.go)
  • Review store changes and test coverage
  • Review test quality and coverage
  • Compile findings and post review

Summary

This is a well-motivated refactor. The core idea—replacing a per-entry store scan with a single consolidated snapshot key—is correct and clean. The placeholder-based restore mechanism elegantly handles the restart window before the DA retriever can re-fire real content hashes. The test suite is comprehensive. Below are findings from most to least critical.


Issues

1. decodeSnapshot silently masks data corruption — generic_cache.go:245-256

When the buffer is non-empty but not a multiple of snapshotEntrySize (e.g. due to partial write or bitflip), decodeSnapshot returns nil, causing RestoreFromStore to behave as if the snapshot were empty. The node silently loses all in-flight DA state rather than surfacing the corruption.

// Current — bad: treats corrupt data as "nothing to restore"
func decodeSnapshot(buf []byte) []snapshotEntry {
    if len(buf) == 0 || len(buf)%snapshotEntrySize != 0 {
        return nil
    }
    ...
}

The fix (already suggested by CodeRabbit) is to return an error for the malformed case:

func decodeSnapshot(buf []byte) ([]snapshotEntry, error) {
    if len(buf) == 0 {
        return nil, nil
    }
    if len(buf)%snapshotEntrySize != 0 {
        return nil, fmt.Errorf("malformed snapshot: len=%d, expected multiple of %d", len(buf), snapshotEntrySize)
    }
    ...
    return entries, nil
}

RestoreFromStore should then propagate the error rather than continue with no state. Fix this →


2. persistSnapshot called with context.Background()generic_cache.go:154,161,204

func (c *Cache[T]) setDAIncluded(hash string, daHeight uint64, blockHeight uint64) {
    ...
    _ = c.persistSnapshot(context.Background()) // bypasses caller's context
}

The caller context is ignored, so if the store becomes unavailable the write hangs until context.Background() is satisfied (i.e., never). The _ = silently discards the error too. This is intentional for best-effort persistence, but the "silently discard" pattern means store failures during mutation are invisible. Consider at least logging a warning:

if err := c.persistSnapshot(context.Background()); err != nil {
    // log.Warn or handle; at minimum, don't fully silently discard
}

3. Test helper has unused parameters — manager_test.go:117

writeHeightDAMeta := func(height, daH uint64, _, _ string) {
    ...
}
writeHeightDAMeta(1, 100, "", "")
writeHeightDAMeta(2, 101, "", "")

The two string parameters are never used. This compiles but is misleading and will fail golangci-lint's unparam or revive checks. Should be:

writeHeightDAMeta := func(height, daH uint64) { ... }
writeHeightDAMeta(1, 100)
writeHeightDAMeta(2, 101)

Fix this →


Observations & Nits

Concurrency in setMaxDAHeightgeneric_cache.go:171-180

func (c *Cache[T]) setMaxDAHeight(daHeight uint64) {
    for range 1_000 {
        ...
        if c.maxDAHeight.CompareAndSwap(current, daHeight) {
            return
        }
    }
}

The 1_000-iteration CAS retry is unusual. Under extreme pathological contention (1000 concurrent writers all racing on this), the update would silently be dropped. In practice this is fine—DA heights advance slowly—but the magic number deserves a comment explaining why it was chosen and what happens if it's exceeded.


initDAHeightFromStore reads store metadata but ignores errors — manager.go:393-409

func (m *implementation) initDAHeightFromStore(ctx context.Context) {
    daIncludedBytes, err := m.store.GetMetadata(ctx, store.DAIncludedHeightKey)
    if err != nil || len(daIncludedBytes) != 8 {
        return
    }
    ...
}

This is intentional best-effort seeding (no error return), but a store read failure on startup is silently swallowed, leaving DaHeight() at zero and potentially causing the syncer to re-scan from genesis. This may be an acceptable trade-off, but the comment block would benefit from explicitly documenting this: "if DAIncludedHeightKey is absent or unreadable, DaHeight() starts at 0 and the syncer will catch up."


ClearFromStore rebuilds all 4 caches — manager.go:379-383

m.headerCache = NewCache[types.SignedHeader](m.store, HeaderDAIncludedPrefix)
m.dataCache = NewCache[types.Data](m.store, DataDAIncludedPrefix)
m.txCache = NewCache[struct{}](nil,)
m.pendingEventsCache = NewCache[common.DAHeightEvent](nil,)

This is correct but worth verifying there are no concurrent readers holding stale references to the old headerCache/dataCache at the time ClearFromStore is called. Since NewManager invokes this during initialization (before goroutines start), it's safe—but a comment noting that ClearFromStore is only safe to call before the cache is shared would prevent future misuse.


IsHeightDAIncluded / setNodeHeightToDAHeight already use height-based lookup as primary — submitter.go:435,499

Both methods now call GetHeaderDAIncludedByHeight / GetDataDAIncludedByHeight directly (not as a fallback after a hash-based miss). This correctly addresses alpe's comment from the review thread.


What's Good

  • Architecture: The single-snapshot-key approach is significantly cleaner than the previous per-entry key model. The O(1) restore claim holds for the snapshot read itself; the O(k) decode (k = in-flight window) is bounded and tiny.
  • Placeholder eviction: The orphan-prevention logic in setDAIncluded (evicting the old placeholder when the real hash arrives) is correct and well-tested.
  • Error handling: Previous versions silently dropped SaveToStore/ClearFromStore errors; this PR propagates them correctly.
  • RestoreFromStore error discrimination: Correctly treats ds.ErrNotFound as a no-op and propagates other errors.
  • Test coverage: TestCache_RestartIdempotent, TestCache_NoPlaceholderLeakAfterRefire, TestSubmitter_IsHeightDAIncluded_AfterRestart, and TestSubmitter_setNodeHeightToDAHeight_AfterRestart are particularly valuable regression tests.
  • Store cleanup: Removing GetMetadataByPrefix is a good simplification—the prefix scan was replaced by more targeted lookups.
  • Timer fix in executor: max(..., 0) instead of negative clamp is correct.

Required Before Merge

# File Line Severity
1 generic_cache.go 245 ⚠️ Medium — corrupt snapshot data silently treated as empty
2 generic_cache.go 154,161,204 ℹ️ Low — store errors on mutation silently dropped
3 manager_test.go 117 🔧 Lint — unused function parameters

@codecov
Copy link

codecov bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 85.84071% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.12%. Comparing base (b1e3010) to head (007ac1c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
block/internal/cache/generic_cache.go 88.73% 4 Missing and 4 partials ⚠️
block/internal/cache/manager.go 78.26% 3 Missing and 2 partials ⚠️
block/internal/syncing/syncer.go 50.00% 2 Missing ⚠️
block/internal/submitting/submitter.go 91.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3134      +/-   ##
==========================================
+ Coverage   61.00%   61.12%   +0.12%     
==========================================
  Files         113      113              
  Lines       11700    11676      -24     
==========================================
  Hits         7137     7137              
+ Misses       3753     3736      -17     
+ Partials      810      803       -7     
Flag Coverage Δ
combined 61.12% <85.84%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@julienrbrt julienrbrt changed the title refactor(store,cache)!: optimize cache restore as O(n) refactor(store,cache)!: optimize cache restore as O(1) Mar 4, 2026
@julienrbrt julienrbrt marked this pull request as ready for review March 5, 2026 12:46
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
block/internal/cache/manager_test.go (2)

459-462: The O(1) restore claim is documented but not asserted.

Line 459-Line 462 describe a single metadata read, but this test currently only validates restored state. Consider an instrumented/mock store to assert metadata-read count during NewManager so this performance contract does not regress.

As per coding guidelines, "Mock external dependencies using mockery".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/cache/manager_test.go` around lines 459 - 462, The test claims
O(1) restore but doesn't assert metadata read count; update the test around the
NewManager call to use a mock store (generated with mockery for the store
interface) that tracks GetMetadata/Get calls and assert exactly one metadata
read during NewManager(cfg, st, zerolog.Nop()). Specifically, replace the plain
store instance with the mock store, set expectations/return values for the
metadata read methods used by Manager (e.g., GetMetadata/GetMetadataKey/Get),
call SaveToStore beforehand to create the snapshot, then create m2 via
NewManager and assert the mock's metadata-read counter or mock expectation was
hit exactly once to enforce the O(1) contract.

117-125: Remove unused placeholder parameters from writeHeightDAMeta.

The two trailing string parameters are never used and are always passed as empty literals. Removing them keeps the helper function focused and aligns with Go best practices for meaningful parameter names.

♻️ Suggested cleanup
-	writeHeightDAMeta := func(height, daH uint64, _, _ string) {
+	writeHeightDAMeta := func(height, daH uint64) {
 		b := make([]byte, 8)
 		binary.LittleEndian.PutUint64(b, daH)
 		require.NoError(t, st.SetMetadata(ctx, pkgstore.GetHeightToDAHeightHeaderKey(height), b))
 		binary.LittleEndian.PutUint64(b, daH)
 		require.NoError(t, st.SetMetadata(ctx, pkgstore.GetHeightToDAHeightDataKey(height), b))
 	}
-	writeHeightDAMeta(1, 100, "", "")
-	writeHeightDAMeta(2, 101, "", "")
+	writeHeightDAMeta(1, 100)
+	writeHeightDAMeta(2, 101)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/cache/manager_test.go` around lines 117 - 125, The helper
function writeHeightDAMeta includes two unused trailing string parameters;
remove those parameters from its signature and all calls to it (change
writeHeightDAMeta := func(height, daH uint64, _, _ string) { ... } to
writeHeightDAMeta := func(height, daH uint64) { ... } and update calls
writeHeightDAMeta(1, 100, "", "") and writeHeightDAMeta(2, 101, "", "") to
writeHeightDAMeta(1, 100) and writeHeightDAMeta(2, 101)); keep the body
unchanged (it should still call st.SetMetadata with
pkgstore.GetHeightToDAHeightHeaderKey and pkgstore.GetHeightToDAHeightDataKey)
so tests compile and behavior is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@block/internal/cache/generic_cache_test.go`:
- Around line 320-332: TestCache_ClearFromStore is asserting removal of
"clear-test/hash1" but the cache only writes the snapshot key
"clear-test/__snap", so the test can pass while ClearFromStore is broken; update
the test to check for removal of the snapshot key instead. Specifically, in
TestCache_ClearFromStore (which uses testMemStore, NewCache[testItem],
c.setDAIncluded and c.ClearFromStore(ctx)), replace the GetMetadata assertion
for "clear-test/hash1" with an assertion that "clear-test/__snap" is missing (or
verify st.GetMetadata returns an error for "clear-test/__snap"), ensuring the
test fails if ClearFromStore does not delete the snapshot key.

In `@block/internal/cache/generic_cache.go`:
- Around line 228-229: The calls to the backing store (e.g.,
c.store.SetMetadata, c.store.Set, c.store.Clear) in SaveToStore and
ClearFromStore currently ignore errors and always return nil; change those sites
to capture the returned error, return or wrap it from the enclosing methods
(SaveToStore, ClearFromStore) so callers see persistence failures, and ensure
snapshotKey()/encodeSnapshot(entries) invocation errors (if any) are also
surfaced; update all instances noted (the SetMetadata call around
snapshotKey/encodeSnapshot and the Set/Clear calls in the blocks corresponding
to lines ~297-303 and ~306-311) to check err != nil and return that error (or a
wrapped error with context).
- Around line 266-269: The restore path currently treats any error from
c.store.GetMetadata(ctx, c.snapshotKey()) as a benign "nothing to restore";
change this to only ignore a not-found error and propagate all other errors with
context. Update the code around c.store.GetMetadata(...) (in the restore logic
that calls c.snapshotKey()) to check the returned error type/value (e.g.,
storage.ErrNotFound or equivalent) and return nil for that case only; for any
other err, wrap it with a contextual message (mentioning snapshotKey or
GetMetadata) and return it so real storage/read failures are not suppressed.

---

Nitpick comments:
In `@block/internal/cache/manager_test.go`:
- Around line 459-462: The test claims O(1) restore but doesn't assert metadata
read count; update the test around the NewManager call to use a mock store
(generated with mockery for the store interface) that tracks GetMetadata/Get
calls and assert exactly one metadata read during NewManager(cfg, st,
zerolog.Nop()). Specifically, replace the plain store instance with the mock
store, set expectations/return values for the metadata read methods used by
Manager (e.g., GetMetadata/GetMetadataKey/Get), call SaveToStore beforehand to
create the snapshot, then create m2 via NewManager and assert the mock's
metadata-read counter or mock expectation was hit exactly once to enforce the
O(1) contract.
- Around line 117-125: The helper function writeHeightDAMeta includes two unused
trailing string parameters; remove those parameters from its signature and all
calls to it (change writeHeightDAMeta := func(height, daH uint64, _, _ string) {
... } to writeHeightDAMeta := func(height, daH uint64) { ... } and update calls
writeHeightDAMeta(1, 100, "", "") and writeHeightDAMeta(2, 101, "", "") to
writeHeightDAMeta(1, 100) and writeHeightDAMeta(2, 101)); keep the body
unchanged (it should still call st.SetMetadata with
pkgstore.GetHeightToDAHeightHeaderKey and pkgstore.GetHeightToDAHeightDataKey)
so tests compile and behavior is preserved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65ba1b1a-6fb8-4530-8f80-54d0c766c4ab

📥 Commits

Reviewing files that changed from the base of the PR and between f0eba0d and fd73169.

📒 Files selected for processing (14)
  • CHANGELOG.md
  • block/internal/cache/generic_cache.go
  • block/internal/cache/generic_cache_test.go
  • block/internal/cache/manager.go
  • block/internal/cache/manager_test.go
  • block/internal/submitting/submitter.go
  • block/internal/submitting/submitter_test.go
  • node/full.go
  • pkg/store/store.go
  • pkg/store/store_test.go
  • pkg/store/tracing.go
  • pkg/store/tracing_test.go
  • pkg/store/types.go
  • test/mocks/store.go
💤 Files with no reviewable changes (5)
  • pkg/store/store_test.go
  • pkg/store/tracing.go
  • pkg/store/types.go
  • pkg/store/store.go
  • test/mocks/store.go

Copy link
Contributor

@tac0turtle tac0turtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice job, love seeing the store cleanup

@julienrbrt julienrbrt enabled auto-merge March 6, 2026 09:00
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
block/internal/cache/generic_cache.go (1)

245-258: Silent nil return for malformed snapshot data masks corruption.

decodeSnapshot returns nil when the buffer length isn't a multiple of snapshotEntrySize. This silently treats corrupted data the same as an empty snapshot, making data corruption hard to detect.

Consider returning an error for the malformed case (non-empty but invalid length) to surface corruption during debugging.

♻️ Suggested improvement
-// decodeSnapshot deserialises a byte slice produced by encodeSnapshot.
-// Returns nil if the slice is empty or not a multiple of snapshotEntrySize.
-func decodeSnapshot(buf []byte) []snapshotEntry {
-	if len(buf) == 0 || len(buf)%snapshotEntrySize != 0 {
-		return nil
+// decodeSnapshot deserialises a byte slice produced by encodeSnapshot.
+// Returns nil, nil for empty buffer; returns error if buffer is malformed.
+func decodeSnapshot(buf []byte) ([]snapshotEntry, error) {
+	if len(buf) == 0 {
+		return nil, nil
+	}
+	if len(buf)%snapshotEntrySize != 0 {
+		return nil, fmt.Errorf("malformed snapshot: length %d not a multiple of %d", len(buf), snapshotEntrySize)
 	}
 	entries := make([]snapshotEntry, len(buf)/snapshotEntrySize)
 	for i := range entries {
 		off := i * snapshotEntrySize
 		entries[i].blockHeight = binary.LittleEndian.Uint64(buf[off:])
 		entries[i].daHeight = binary.LittleEndian.Uint64(buf[off+8:])
 	}
-	return entries
+	return entries, nil
 }

This would require updating RestoreFromStore to handle the error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@block/internal/cache/generic_cache.go` around lines 245 - 258, decodeSnapshot
currently returns nil for malformed buffers which hides corruption; change its
signature to return ([]snapshotEntry, error) and have it return an explicit
error when buf is non-empty but len(buf)%snapshotEntrySize != 0, keeping the
successful path returning entries and nil error; update any callers (notably
RestoreFromStore) to handle the error (propagate or surface it) and adjust any
tests/type usages accordingly; reference symbols: decodeSnapshot, snapshotEntry,
snapshotEntrySize, and RestoreFromStore.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@block/internal/cache/generic_cache.go`:
- Around line 245-258: decodeSnapshot currently returns nil for malformed
buffers which hides corruption; change its signature to return ([]snapshotEntry,
error) and have it return an explicit error when buf is non-empty but
len(buf)%snapshotEntrySize != 0, keeping the successful path returning entries
and nil error; update any callers (notably RestoreFromStore) to handle the error
(propagate or surface it) and adjust any tests/type usages accordingly;
reference symbols: decodeSnapshot, snapshotEntry, snapshotEntrySize, and
RestoreFromStore.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa4d0446-cbc3-44ac-b69c-2d22a691f033

📥 Commits

Reviewing files that changed from the base of the PR and between fd73169 and 3834cfd.

📒 Files selected for processing (4)
  • block/internal/cache/generic_cache.go
  • block/internal/cache/generic_cache_test.go
  • block/internal/executing/executor.go
  • pkg/sequencers/single/sequencer_test.go

Copy link
Contributor

@alpe alpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart solution. 🥇

entries, err := c.store.GetMetadataByPrefix(ctx, c.storeKeyPrefix)
if err != nil {
return fmt.Errorf("failed to query metadata by prefix %q: %w", c.storeKeyPrefix, err)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: store and storeKeyPrefix must not be empty in normal use. why not move this guard to the constructor to fail fast?

daHeightForHeader, ok := s.cache.GetHeaderDAIncluded(headerHash.String())
if !ok {
return fmt.Errorf("header hash %s not found in cache", headerHash)
daHeightForHeader, ok = s.cache.GetHeaderDAIncludedByHeight(height)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and L 458: why use this as fallback instead of default? you would save the headerHash time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, i will simplify

@julienrbrt julienrbrt disabled auto-merge March 6, 2026 09:30
@julienrbrt julienrbrt enabled auto-merge March 6, 2026 09:57
@julienrbrt julienrbrt added this pull request to the merge queue Mar 6, 2026
Merged via the queue into main with commit a96974b Mar 6, 2026
31 of 32 checks passed
@julienrbrt julienrbrt deleted the julien/cache branch March 6, 2026 10:30
alpe added a commit that referenced this pull request Mar 9, 2026
* main:
  chore: minor deduplication (#3139)
  refactor(store,cache)!: optimize cache restore as O(1) (#3134)
  ci: remove spamoor results from benchmark results per PR (#3138)
  build(deps): Bump rollup from 4.22.4 to 4.59.0 in /docs in the npm_and_yarn group across 1 directory (#3136)
  fix(block): fix blocktime timer usage (#3137)
  build(deps): bump core v1.0.0 (#3135)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants